Abstract

The storage/directory method described in this paper allows for
flexibility in modification of a rapidly growing directory while
maintaining a reasonable number of searches necessary to locate items
whose keys are stored in the directory. A mathematical analysis shows
the average number of searches to be of the same order of magnitude as
for a binary-chop method of table look-up, while the directory itself
need not be reordered to accommodate new entries. The analysis is
followed by a series of procedures and suggestions for procedures which
should prove useful when employing this method.

Introduction 

Two classical directory schemes are the binary-chop and
sequential-link pointer* methods. The method to be described, which is
essentially that of Hibbard [1], embodies many of the desirable features
of these two schemes while avoiding many of their disadvantages. This
method or storage directory system will be referred to as the
Binary-Chop Pointer (BCP) method. As will be shown later in this paper,
the BCP method reduces to either the binary-chop or the sequential
pointer linked directory method at either end of its functional
spectrum.

General 

The binary-chop search method is a highly efficient directory
look-up procedure. Given a directory of n elements in collating sequence,
this scheme locates a key being searched for in, on the average,
$\log_2 n - 1$ searches. The main disadvantages of this type of table are
the necessary reordering of the entries as additions are made and the
necessity of using contiguous storage space for the table.

The sequential-pointer linked directory allows for rapid
modification and use of non-contiguous storage by merely setting
pointers. On the other hand, the search of a table of this type is
necessarily a linear process and hence requires, on the average, $n/2$
searches for a table of n entries. A BCP table allows for addition of
entries with no reordering, and does not require contiguous storage.
The average number of searches, however, is small, being less than
$1.38 \log_2 n$.

*i.e. a directory in which a pointer in each entry indicates the next
entry.

Format  of  the  Table 

A  BCP  table  is  made  up  of  n  entries,  each  entry  consisting  of 
the  following  items: 

1)  the  key  on  which  the  search  is  made. 

2) a low pointer or LOP.

3) a high pointer or HIP.

4) the argument which is associated with the key in that entry.

The LOP in an entry refers (points) to an entry whose key is
low with respect to the present entry's key. Similarly, the HIP in an
entry points to an entry whose key is high with respect to the present
key. Thus, an examination of a particular key in the table during a
search operation results in a ternary decision branch:

i) an equal compare, in which case the argument corresponding
to that entry is retrieved.
ii) the key being searched for is lower than the one being
examined, in which case the LOP indicates the location
of the next key to be examined.
iii) the key being searched for is higher than the one being
examined, in which case the HIP indicates the location of
the next key to be examined.

Example: 

Assume that the following five names appear as keys in a BCP table:

MAC, BOB, PETE, SAM, JOE

The table might be:

Location   Key    LOP   HIP   Argument
   1       MAC     2     3      XX
   2       BOB           5      XX
   3       PETE          4      XX
   4       SAM                  XX
   5       JOE                  XX

If  one  desires  to  insert  an  entry  whose  key  is  MIKE,  the 
following  operations  occur: 

1) Insert the entry in an available storage location (in this
case in location 6) with the LOP and HIP blank.

2) Compare MIKE to the key at location 1. Examine the appropriate
pointer (in this case it is the HIP, since MIKE > MAC). If it is
not blank, continue the search at the indicated location (i.e.,
now make the comparison at location 3).

3) When the examined pointer is blank, the location of the key
being added is then inserted. In this case, the resultant
table is:


Location   Key    LOP   HIP   Argument
   1       MAC     2     3      XX
   2       BOB           5      XX
   3       PETE    6     4      XX
   4       SAM                  XX
   5       JOE                  XX
   6       MIKE                 XX

Suppose one started with no entries in the table. If one then
inserted MAC, followed by BOB, PETE, SAM, JOE, MIKE, in that order, the
table above will result. However, if one were to insert the entries in
the order BOB, MIKE, MAC, JOE, SAM, PETE, one would obtain the table:

Location   Key    LOP   HIP
   1       BOB           2
   2       MIKE    3     5
   3       MAC     4
   4       JOE
   5       SAM     6
   6       PETE

Note  that  the  structure  of  a  BCP  table  is  highly  dependent  upon 
the  ordering  of  keys  as  they  are  entered. 

The algorithm used in determining the location of the pointer to
the new entry in the last example is the method used to search the table
to retrieve an entry already in the table. An equal compare will result
when the desired entry is encountered. An example of this basic
algorithm was used by Knowlton [2]. A convenient form for representing
the BCP table is a tree in which each node corresponds to an entry in the
table. From each node there may be both left and right arrows,
corresponding to the LOP and HIP respectively. The tree representation
of the last table is shown in Figure 1. A node X which has a pointer to
a node Y is said to subsume node Y and all nodes subsumed by Y.
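
The insertion and search procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `Entry`, `BCPTable`, `insert` and `search` are hypothetical, locations are numbered from 1 as in the tables above, and a duplicate key is simply rejected.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Entry:
    key: str
    arg: Any = None
    lop: Optional[int] = None  # location of an entry with a lower key
    hip: Optional[int] = None  # location of an entry with a higher key


class BCPTable:
    """Entries live in a flat list; location k is stored at index k - 1."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def insert(self, key: str, arg: Any = None) -> int:
        """Store the entry in the next free location, then link it in."""
        self.entries.append(Entry(key, arg))
        new_loc = len(self.entries)
        if new_loc == 1:
            return new_loc  # the first entry becomes the top node
        loc = 1
        while True:
            entry = self.entries[loc - 1]
            if key == entry.key:
                self.entries.pop()
                raise KeyError(f"duplicate key: {key}")
            field = "lop" if key < entry.key else "hip"
            nxt = getattr(entry, field)
            if nxt is None:  # blank pointer found: link the new entry here
                setattr(entry, field, new_loc)
                return new_loc
            loc = nxt

    def search(self, key: str):
        """Return (argument, keys examined); the argument is None if absent."""
        loc, examined = (1 if self.entries else None), 0
        while loc is not None:
            entry = self.entries[loc - 1]
            examined += 1
            if key == entry.key:
                return entry.arg, examined
            loc = entry.lop if key < entry.key else entry.hip
        return None, examined


table = BCPTable()
for name in ["MAC", "BOB", "PETE", "SAM", "JOE", "MIKE"]:
    table.insert(name, arg="XX")
pointers = [(e.key, e.lop, e.hip) for e in table.entries]
```

Inserting the six names in the order MAC, BOB, PETE, SAM, JOE, MIKE leaves `pointers` holding the same LOP and HIP values as the six-entry table in the example, with `None` standing for a blank pointer.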

Analysis 

A  measure  of  the  worth  of  a  table  searching  algorithm  is  the 
average  time  required  to  locate  an  item  in  the  table.   Here,  this  time 
is  directly  proportional  to  the  average  number  of  entries  examined  in 
obtaining  a  required  entry.   For  a  given  table  this  is  the  average  of  the 
search  times  to  each  node  in  the  tree  that  the  table  represents.   A  search 
time  of  1  is  assumed  to  the  top  node,  2  to  each  node  subsumed  immediately 
below  this,  and  so  on;  a  table  with  no  nodes  will  be  defined  to  have  a 
search  time  of  0  to  each  element,  and  hence  an  average  search  time  of  0. 
The analysis is based on the notion that for a table of n+1 entries there
are exactly (n+1)! different orders in which the entries can be made. For
simplicity and without loss of generality it is assumed that the values of
the keys of the entries are the integers 1 through n+1. Any given order of
elements in the table will be assumed equally likely, i.e., with a
probability of $1/(n+1)!$.

Each ordering of entries defines exactly the tree structure of
the table. The (n+1)! resulting trees are partitioned according to which
of the (n+1) integers is entered first and hence determines the top node.
Clearly there are n! trees in each block of this partitioning; each block
is equally likely.

The top node in the tree determines the number of elements to
the left of the top, and the number to the right. For example, for
tables consisting of the 5 keys 1, 2, 3, 4, 5, if 4 is the top node, the
subtree to the left of 4 will have 1, 2, and 3 in it, and that to the
right will have the node 5.

In any given block, say the (k+1)th, all trees can be characterized
by the form shown in Figure 2; i.e., the top node is k+1, there are
the k nodes for 1 to k in the subtree to the left and the (n-k) nodes for
k+2 to n+1 in the subtree to the right.

Within the block we will have all trees of this form. Hence, on
the average, the search time to an element in the left subtree will be
S(k)+1, where S(k) is the average search time to an element in a table with
k elements. There will be k such elements. The search time to an element
in the right subtree will be S(n-k)+1. There will be (n-k) such elements.
The search time to the top element, k+1, is 1.
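
For small tables the averaging over all equally likely insertion orders can be checked by brute force. The following Python sketch (the function names are hypothetical, and exact fractions are used to avoid rounding) builds the tree for every ordering of the keys 1 through n and averages the search times, with a search time of 1 to the top node.

```python
from fractions import Fraction
from itertools import permutations


def total_search_time(order):
    """Sum of search times over all keys of the tree built by inserting `order`."""
    lop, hip = {}, {}
    root, total = order[0], 1  # the top node has search time 1
    for key in order[1:]:
        node, depth = root, 1
        while True:
            links = lop if key < node else hip
            depth += 1
            if node not in links:  # blank pointer: the new key hangs here
                links[node] = key
                break
            node = links[node]
        total += depth
    return total


def average_search_time(n):
    """Average search time over all n! equally likely orders of keys 1..n."""
    if n == 0:
        return Fraction(0)
    orders = list(permutations(range(1, n + 1)))
    return Fraction(sum(total_search_time(o) for o in orders), n * len(orders))
```

For three keys, two of the six orders give the balanced tree (average 5/3) and four give a chain (average 2), so `average_search_time(3)` is 17/9.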

Let the average search time over the (k+1)th block of the set of
all trees having n+1 elements be denoted $S_{k+1}(n+1)$. Then

$$S_{k+1}(n+1) = \frac{k[S(k)+1] + (n-k)[S(n-k)+1] + 1}{n+1}
             = \frac{kS(k) + (n-k)S(n-k) + n + 1}{n+1}$$

Now to obtain the average search time over all blocks we take
an average (unweighted, as all blocks are equally likely) of
$S_{k+1}(n+1)$ over k = 0, ..., n. Thus, we arrive at the basic recursion
formula:

$$S(n+1) = \frac{1}{n+1} \sum_{k=0}^{n} \frac{kS(k) + (n-k)S(n-k) + n + 1}{n+1}$$

$$S(n+1) = \frac{\sum_{k=0}^{n} \left[ kS(k) + (n-k)S(n-k) \right]}{(n+1)^2} + 1 \qquad (1)$$


As the summation index k runs from 0 to n, kS(k) runs from 0S(0) to nS(n)
and (n-k)S(n-k) runs from nS(n) to 0S(0). Also 0S(0) = 0. This leads to

$$S(n+1) = \frac{2 \sum_{k=1}^{n} kS(k)}{(n+1)^2} + 1 \qquad (2)$$


This formula involves, in the calculation of S(n+1), the use of the values
S(1) ... S(n). A more useful formula involving only S(n) can be obtained
as follows:

$$S(n+1) = \frac{2 \sum_{k=1}^{n} kS(k)}{(n+1)^2} + 1$$

Or:

$$(n+1)^2 \left[ S(n+1) - 1 \right] = 2 \sum_{k=1}^{n} kS(k) \qquad (3)$$

$$= 2nS(n) + 2 \sum_{k=1}^{n-1} kS(k)$$


But from (3) with n replaced by n-1,

$$2 \sum_{k=1}^{n-1} kS(k) = n^2 \left[ S(n) - 1 \right]$$


Hence,

$$(n+1)^2 \left[ S(n+1) - 1 \right] = 2nS(n) + n^2 \left[ S(n) - 1 \right]$$

$$S(n+1) - 1 = \frac{2nS(n) + n^2 S(n) - n^2}{(n+1)^2}$$

$$S(n+1) = \frac{(n+2)n\,S(n) - n^2 + (n+1)^2}{(n+1)^2}
        = \frac{(n+2)n\,S(n) + 2n + 1}{(n+1)^2} \qquad (4)$$

This formula can also be written:

$$(n+1)\,S(n+1) = \frac{(n+2)n\,S(n) + 2n + 1}{n+1}$$

And  clearly  S(n)  behaves  as  log  n  for  n  sufficiently  large. 

Now knowing that S(n) = k ln n in the limit, the value of k
can be determined from (4). Clearly:

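$$k \ln(n+1) \approx \frac{(n+2)n}{(n+1)^2}\, k \ln(n) + \frac{2n+1}{(n+1)^2}$$

As $n \to \infty$,

$$\frac{(n+2)n}{(n+1)^2} \to 1$$

Also

$$\ln(n+1) = \ln(n) + \ln\left(1 + \frac{1}{n}\right)
          = \ln(n) + \frac{1}{n} - \frac{1}{2n^2} + \cdots$$

Hence

$$k \ln(n) + \frac{k}{n} - \frac{k}{2n^2} + \cdots = k \ln(n) + \frac{2n+1}{(n+1)^2}$$

Comparing first order terms in:

$$\frac{k}{n} - \frac{k}{2n^2} + \cdots = \frac{2}{n} - \frac{3}{n^2} + \cdots$$

gives, in the limit as $n \to \infty$, $k \to 2$.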

However, by use of the formula exactly as calculated by a computer, we
obtain the results shown in Figure 3. Hence, for tables in the range of
normal usage, say up to $10^7$ entries,

$$k < 1.89, \quad \text{and} \quad S(n) < 1.89 \ln(n) = 1.31 \log_2(n).$$
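
The exact values can be regenerated directly from recursion (4), starting from S(0) = 0. The sketch below is a hypothetical helper in Python, not the authors' program; it tabulates S(n) together with the ratios S(n)/ln n and S(n)/log2 n for a few table sizes.

```python
import math


def average_search_times(n_max):
    """Return [S(0), ..., S(n_max)] using S(n+1) = ((n+2) n S(n) + 2n + 1) / (n+1)^2."""
    s = [0.0]
    for n in range(n_max):
        s.append(((n + 2) * n * s[n] + 2 * n + 1) / (n + 1) ** 2)
    return s


s = average_search_times(100_000)
for n in (10, 1_000, 100_000):
    k = s[n] / math.log(n)  # the ratio S(n) / ln n
    print(n, round(s[n], 3), round(k, 3), round(s[n] / math.log2(n), 3))
```

The first values are S(1) = 1, S(2) = 3/2 and S(3) = 17/9. The ratio S(n)/ln n printed by the loop rises slowly with n and stays below its limiting value of 2 over this range.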
Deletions in a BCP Table

Deletion of an item from a basic BCP table is not always a
trivial operation. A node can have one of three pointer structures within
the BCP table. These are represented graphically in Figures 4(a) - 4(c),
where X is the node to be deleted.

The deletion in case a) is merely the removal of the pointer to
node X. In case b), the pointer to node X is reset to point to node Y,
as is shown schematically in Figure 4(d).

In  either  case,  the  entry  corresponding  to  the  deleted  node  X 
will,  in  general,  occur  somewhere  imbedded  in  the  actual  BCP  table.   This 
space  is  now  freed  and  may  be  used  for  a  subsequent  entry  (addition)  in 
the  table. 

Case  c)  is  somewhat  more  difficult  to  handle  because  there 
exists  a  single  pointer  to  X  but  two  from  X.   Clearly  this  single  pointer 
cannot  be  made  to  point  to  two  different  nodes  at  the  same  time. 

One possible solution would be to reset this lone pointer to
point to one of the two nodes subsumed by X, say the left (or Y). A
pointer to the other subsumed node (Z) would be established at the first
node not having a HIP in use which is reached by successively following
HIPs from Y. (Note that any other form of linking node Z from a node
subsumed by Y will yield an incorrectly structured BCP table. In
particular, when there exists any other linkage to Z, the Z entry is
irretrievable.) This solution is undesirable because the search path
to Z and any node subsumed by it must now go through Y and, in general,
other nodes subsumed by Y.

As  an  example,  consider  the  representation  of  a  BCP  table  shown 
in  Figure  5(a)  where  the  number  of  searches  for  each  node  are  listed 
below  the  respective  nodes. 

It  is  desired  to  delete  node  e.   If  the  pointer  is  now  changed 
to  indicate  the  left  node  subsumed  by  e,  the  resultant  situation  is 
shown  in  Figure  5(b).   For  the  case  in  which  the  pointer  is  reset  to 
indicate  the  right  node  first,  the  table  is  represented  by  Figure  5(c). 
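
Deletion by pointer resetting, with the left node taken first in case c), might be sketched as follows. This is an illustrative Python sketch under stated assumptions: nodes are linked objects rather than table locations, the names `Node`, `insert`, `delete` and `search_time` are hypothetical, and the keys used are arbitrary integers rather than the nodes of Figure 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    lop: Optional["Node"] = None
    hip: Optional["Node"] = None


def insert(root, key):
    if root is None:
        return Node(key)
    node = root
    while True:
        side = "lop" if key < node.key else "hip"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child


def delete(root, key):
    """Delete `key` by pointer resetting; returns the (possibly new) top node."""
    parent, side, x = None, None, root
    while x is not None and x.key != key:
        parent, side = x, ("lop" if key < x.key else "hip")
        x = getattr(x, side)
    if x is None:
        return root  # key not present
    if x.lop is None or x.hip is None:
        replacement = x.lop or x.hip  # case a) gives None, case b) the lone node
    else:
        replacement = y = x.lop       # case c): point to the left node Y ...
        while y.hip is not None:      # ... follow HIPs from Y to a blank one ...
            y = y.hip
        y.hip = x.hip                 # ... and hang Z there
    if parent is None:
        return replacement
    setattr(parent, side, replacement)
    return root


def search_time(root, key):
    node, count = root, 0
    while node is not None:
        count += 1
        if key == node.key:
            return count
        node = node.lop if key < node.key else node.hip
    return None


root = None
for k in [8, 4, 12, 2, 3, 6]:
    root = insert(root, k)
before = search_time(root, 6)
root = delete(root, 4)
after = search_time(root, 6)
```

In this small example node 4 subsumes 2 (which in turn subsumes 3) on the left and 6 on the right. After the deletion, node 6 hangs from the HIP of node 3, so its search time grows from 3 to 4, which is the drawback noted in the text.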
[Figures 5(a) - 5(d): tree representations of the deletion example.
* indicates a null node.]

A more acceptable solution to the problem is reached by
allowing the "image" of the deleted node to remain for comparison
purposes. The entry in the table corresponding to this node is flagged
to indicate that it no longer exists as a standard entry in the table,
but rather is there only to allow the search algorithm to continue to
reach nodes subsumed by it. Such a node is called a null node. The
tree representing such a deletion is shown in Figure 5(d). Note that
a null node may be entirely deleted if one or both of the two nodes
subsumed by it become deleted through some sequence of later
deletions. When this occurs, the null node is handled as in the earlier
non-troublesome cases (cases a and b above).

It  is  also  possible  to  re-utilize  the  space  in  the  table 
containing  a  null  node  entry  when  a  subsequent  addition  to  the  table 
falls  anywhere  in  the  allowable  range.   The  authors  have  written  an 
algorithm  which  does  this  in  a  manner  which  is  essentially  no  more 
complicated  than  the  standard  search  algorithm. 

Balance  of  a  BCP  Table 

The  structure  of  a  BCP  table  varies  between  two  extremes. 
These  endpoints  are  referred  to  as  best  case  and  worst  case  conditions. 
The  best  case  condition  occurs  when  S  is  a  minimum  for  a  given  number 
of  entries  in  the  table  and  is  the  same  (with  respect  to  search  time) 
as a binary-chop method. This case is equivalent to the method
described by Brooks and Iverson. A schematic representation for such
a  table  containing  seven  keys  (the  integers  1  -  7)  is  shown  in 
Figure  6(a).   Note  that  the  tree  corresponding  to  this  condition  is 
unique  only  when  the  number  of  entries  in  the  table  is  one  less  than 
a  power  of  two. 

The  worst-case  condition  is  satisfied  when  S  is  a  maximum  for 
a  given  number  of  entries  in  the  table.   The  search  time  for  the  table 
in  this  case  is  the  same  as  for  a  sequential-linked-pointer  directory. 
Such  a  situation  is  that  depicted  in  Figure  6(b). 
One method of eliminating this situation comes to mind when
one  considers  that  most  tables  or  directories  are  not  formed  by  starting 
with  an  empty  list  and  merely  adding  entries.   In  general  an  initial 
table  is  established  by  some  kind  of  declaration  and  this  table  is 
modified  by  later  adding  or  deleting  entries.   The  entries  thus 
initially  declared  should  be  established  in  the  table  in  a  best-case 
fashion.   A  practical  alternative  to  this  method  is  applicable  in 
cases  where  there  exists  previous  knowledge  as  to  the  usage  of  the 
initial  entries. 

In  such  a  situation,  the  approximate  frequencies  of  use 
(retrieval)  could  also  be  declared  and  this  information  would  be  used  to 
establish  an  optimum  BCP  table.   Such  a  table  will  minimize  S  over  the 
retrieval  of  all  entries  in  the  table  where  the  retrieval  of  each  entry 
is  weighted  accordingly.   Note  that  in  general  this  is  not  the  same  as 
a  best-case  table.   Regardless  of  the  initial  method  used  to  start  the 
table,  the  tree  representing  the  table  might  become  highly  asymmetric 
or skewed through subsequent additions and/or deletions. The way to
determine  this  would  be  to  establish  a  measure  of  skewness  which  would 
be  calculated  periodically.   When  this  measure  exceeded  some  allowable 
limit,  a  garbage  collection  routine  could  take  over  and  restructure 
the  table.   Note  that  a  larger  well-structured  table  will  require  more 
modifications  than  a  smaller  one  before  becoming  adversely  skewed, 
hence a very large directory need be examined for skewness less
frequently than a smaller one.

A  simple,  although  not  necessarily  best,  measure  of  skewness 
is  merely  S  for  a  given  table.  This  can  then  be  compared  to  S  for  the 
best-case  condition  and  suitable  action  can  then  be  taken. 
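
As an illustration, the initial entries can be placed in a best-case fashion by entering the middle key first and then the middle keys of the two halves, and so on. The sketch below (hypothetical function names, distinct keys assumed) uses this ordering to obtain the best-case S and expresses skewness as the ratio of a table's S to it; the ratio is only one possible way of making the comparison suggested above.

```python
def best_case_order(sorted_keys):
    """Insertion order (middle key first, level by level) giving a best-case tree."""
    order, pending = [], [(0, len(sorted_keys))]
    while pending:
        lo, hi = pending.pop(0)
        if lo < hi:
            mid = (lo + hi) // 2
            order.append(sorted_keys[mid])
            pending += [(lo, mid), (mid + 1, hi)]
    return order


def average_search_time(order):
    """S for the table built by inserting `order` into an empty BCP table."""
    lop, hip, total = {}, {}, 0
    for key in order:
        node, depth = order[0], 1
        while key != node:
            links = lop if key < node else hip
            node = links.setdefault(node, key)  # follow, or set, the pointer
            depth += 1
        total += depth
    return total / len(order)


def skewness(order):
    """Ratio of the table's S to the best-case S for the same keys (1.0 = best case)."""
    best = average_search_time(best_case_order(sorted(order)))
    return average_search_time(order) / best
```

For the seven keys 1 through 7, `best_case_order` gives 4, 2, 6, 1, 3, 5, 7 with S = 17/7, while entering the keys in ascending order gives the worst case with S = 4, a skewness of 28/17.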

Multiple  Keying 

A method suggested by Brooks and Iverson for a particular
type  of  binary-chop  table  is  also  applicable  to  the  BCP  table  --  namely 
multiple  keying.   In  the  BCP  table  this  is  equivalent  to  imagining  many 
columns  instead  of  a  single  column  for  each  entry  (i.e.,  for  a  single 
argument)  each  containing  a  key  and  a  set  of  pointers.   This  allows 
searching on any set of keys and can be thought of as an alternate
indexing  scheme.   Note  that  it  is  also  possible  to  use  one  or  more  of 
the  keys  of  an  entry  as  the  actual  argument  in  some  cases.  An  example 
of  such  a  case  would  be  telephone  directory  information  where  the  sets 
of  keys  would  be  name,  address,  and  telephone  number,  thus  allowing  the 
retrieval  of  either  or  both  of  the  remaining  corresponding  keys  when 
searching  on  any  one  of  the  three. 
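
Multiple keying can be sketched by giving each entry one key and one pair of pointers per column. In the Python sketch below the class name, the column names and the directory data are all hypothetical, keys are assumed unique within each column, and every column's search starts at the first entry.

```python
class MultiKeyTable:
    """Each entry carries one (key, LOP, HIP) triple per column."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.entries = []  # entry = {"keys": {...}, "lop": {...}, "hip": {...}}

    def insert(self, **keys):
        entry = {"keys": keys, "lop": {}, "hip": {}}
        self.entries.append(entry)
        new_loc = len(self.entries) - 1
        for col in self.columns:  # link the entry into every column's tree
            loc = 0
            while loc != new_loc:
                node = self.entries[loc]
                side = "lop" if keys[col] < node["keys"][col] else "hip"
                loc = node[side].setdefault(col, new_loc)
        return new_loc

    def search(self, col, key):
        """Return all keys of the entry whose `col` key equals `key`, or None."""
        loc = 0 if self.entries else None
        while loc is not None:
            node = self.entries[loc]
            if key == node["keys"][col]:
                return node["keys"]
            side = "lop" if key < node["keys"][col] else "hip"
            loc = node[side].get(col)
        return None


directory = MultiKeyTable(["name", "address", "number"])
directory.insert(name="MAC", address="12 Elm St", number="555-0103")
directory.insert(name="BOB", address="7 Oak Ave", number="555-0101")
directory.insert(name="PETE", address="3 Ash Rd", number="555-0102")
```

A call such as `directory.search("number", "555-0102")` then returns the name and address stored with that number, without any separate index being kept.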

Segmentation 

When  using  a  BCP  table  with  multiple  keys,  it  is  not  always 
possible  to  obtain  a  system  of  segmenting  the  table  which  is  universal 
to  all  keys.   In  the  telephone  directory  example,  one  would  be  inviting 
trouble  to  suggest  that  the  table  can  be  broken  into  two  tables  by 
merely inserting people with last names beginning in A-L in one table and
M-Z in the other. Which of these tables does one look in when one
wants  to  retrieve  a  name  but  knows  just  a  telephone  number? 

One  special  feature  of  the  BCP  table  is  helpful  in  this 
respect:   If  we  always  require  null  nodes  when  deleting  items,  all 
pointers in the table point down in the table (i.e., away from the first
entry). The entire table can then be thought of as one long continuous
one, where different segments may be retrieved and searched as needed.
In general, a pointer from an entry in a segment may point to an entry
in another segment. However, because these pointers only point
downward, it will never be necessary to call a segment into memory twice
to  locate  an  entry.   Because  a  pointer  need  not  point  from  an  entry  in 
a  segment  to  an  entry  in  the  next  segment,  some  segments  might  be 
entirely  passed  over.   It  would  also  be  possible  to  structure  the  initial 
table  (at  declaration  time)  such  that  a  minimal  number  of  segments  need 
be  entered  (and  hence  retrieved). 
Multiple Entry Pointers

Previously,  the  top  node  of  the  tree  defining  a  table  was  the 
initial  entry  in  that  table  and  hence  this  entry  contained  the  first  key 
to  be  examined  in  any  search.   This  entry  can  be  chosen  at  the  initial 
declaration  of  the  directory  from  those  entries  thus  declared,  but  when 
multiple-keying the best candidate is not obvious. (Indeed, a best
candidate may not exist -- the best choice with respect to one set of keys
might be the worst with respect to another.) A method of resolving this
is to assign an entry pointer for each column. In the case where the table
is to be segmented, there need only exist the further restrictions that each
of these pointers indicate an entry in the first segment and that pointers
not in this first segment point downward as before; i.e., the pointers in
the first segment may point up or down. This creates no problems because
this  first  segment  is  always  the  first  to  be  retrieved  and  there  are  none 
before  (higher  than)  it  which  could  be  recalled. 

Variable  Segment  Boundaries 

In  searching  through  a  segmented  table,  it  is  obviously 
advantageous  to  minimize  the  number  of  segments  which  must  be  retrieved. 
The fact that pointers only point downward can sometimes prove useful
here, too, depending upon the configuration of the table in tertiary
storage.

If  the  table  is  stored  contiguously  such  that  a  number  of 
records  comprise  each  segment,  the  segment  boundaries  may  be  varied  to 
optimize  the  transfer  of  data  (table  segments)  into  primary  storage. 
When  a  pointer  indicates  an  entry  which  is  outside  of  the  segment 
containing  that  pointer,  the  next  segment  to  be  retrieved  is  the  one 
starting  with  the  record  containing  the  new  entry  and  continuing  until 
the  proper  number  of  records  is  reached.   For  example,  suppose  that 
segments  are  made  up  of  four  records  each,  and  that  the  entire  table 
is  stored  contiguously.   If  a  pointer  in  the  first  segment  indicates 
an  entry  in  the  seventh  record,  the  second  segment  (consisting  of 
records  5-8)  need  not  be  retrieved.   Instead,  a  segment  consisting 
of records 7-10 can be retrieved. If this new entry's pointer now
routed the algorithm to an entry in the first half of the third segment
(records 9 and 10), no new segment need be immediately retrieved.
This would not have been the case had records 5-8 been retrieved.
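
The saving can be illustrated by counting segment retrievals along a search path under both policies. The Python sketch below is hypothetical: `path` is the sequence of record numbers visited by a search (never decreasing, since pointers only point downward), and `size` is the number of records per segment.

```python
def fixed_loads(path, size):
    """Segments fetched when boundaries are fixed at records 1..size, size+1..2*size, ..."""
    loads, current = [], None
    for record in path:
        segment = (record - 1) // size
        if segment != current:
            current = segment
            loads.append((segment * size + 1, (segment + 1) * size))
    return loads


def variable_loads(path, size):
    """Segments fetched when each new segment starts at the record pointed to."""
    loads, end = [], 0
    for record in path:
        if record > end:  # the record lies beyond the segment now in memory
            end = record + size - 1
            loads.append((record, end))
    return loads


path = [1, 7, 9]  # records visited in the example of the text
fixed = fixed_loads(path, 4)
variable = variable_loads(path, 4)
```

For the example of the text, with four records per segment and a search visiting records 1, 7 and 9, fixed boundaries require the segments 1-4, 5-8 and 9-12, while variable boundaries require only 1-4 and 7-10.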

Duplications 

A whole area of thought is opened up when we consider the
effects of having duplicated keys in any column.

If  no  special  precautions  are  taken  we  will  have  an  equal 
compare  resulting  while  adding  an  entry  to  the  table.   Normally  this 
would  trigger  an  error  condition. 

The  simplest  solution  is  to  allow  a  duplicate  key  to  be 
considered  as  higher  than  the  equal  entry  already  in  the  table.   This 
inserts  duplicates  like  any  other  entry.   However  when  searching  to 
locate  an  element  one  would  have  to  search  all  the  way  to  the  bottom  of 
the  tree  to  determine  if  there  were  any  duplicate  entries.   A  partial 
solution  is  to  flag  entries  which  have  duplicates.   Then  one  need  only 
search  if  one  knew  a  duplicate  existed.   A  further  disadvantage  of  this 
system  is  that  to  reach  entries  lower  in  the  tree  one  will  in  general 
have  to  search  many  duplicates  --  a  time  wasting  procedure. 

The  best  system  is  to  provide  a  pointer  from  a  node  which  is 
being duplicated by a new entry to that new entry. If an entry has SAM
as a key, for example, a third pointer is provided to the next entry
with SAM as a key (the HIP and LOP being the first two). The first SAM
is flagged to indicate that a duplicate exists, or the presence or absence
of the third pointer (referred to as a duplicate pointer -- DUP) can be
tested. A third entry with SAM as a key will simply be pointed to in a
sequential-pointer-linked  fashion  from  the  second  SAM-keyed  entry,  and 
so  forth.   To  allow  this  scheme,  there  must  be  storage  provided  (for  each 
column  if  multiple-keying)  in  each  original  (non-duplicate)  entry  to 
provide  space  for  a  possible  later  duplicate  pointer.   This  usually 
represents  a  fairly  expensive  use  of  storage. 
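
The duplicate-pointer scheme can be sketched as follows. This is an illustrative Python sketch with hypothetical names in which every entry reserves room for a DUP, and the presence or absence of the DUP is tested rather than a separate flag.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    key: str
    arg: Any
    lop: Optional["Node"] = None
    hip: Optional["Node"] = None
    dup: Optional["Node"] = None  # next entry carrying the same key


def insert(root, key, arg):
    new = Node(key, arg)
    if root is None:
        return new
    node = root
    while key != node.key:
        side = "lop" if key < node.key else "hip"
        if getattr(node, side) is None:
            setattr(node, side, new)
            return root
        node = getattr(node, side)
    while node.dup is not None:  # equal compare: walk to the end of the DUP chain
        node = node.dup
    node.dup = new
    return root


def search_all(root, key):
    """Return the arguments of every entry whose key equals `key`, oldest first."""
    node = root
    while node is not None and node.key != key:
        node = node.lop if key < node.key else node.hip
    found = []
    while node is not None:
        found.append(node.arg)
        node = node.dup
    return found


root = None
for arg, name in enumerate(["MAC", "SAM", "BOB", "SAM", "SAM"], start=1):
    root = insert(root, name, arg)
```

Here the second and third SAM entries are chained from the first in a sequential-pointer-linked fashion, so `search_all(root, "SAM")` returns the arguments 2, 4 and 5 without disturbing the search paths to any other key.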
A scheme which overcomes the necessity of wasting core on
possibly unused duplicate pointers is the following. The second SAM-keyed
entry has its HIP set to point to the entry which the HIP of the first
SAM-keyed entry indicated. The HIP in the original SAM-keyed entry
then indicates the new entry. The LOP of the new entry can be used for the
first pointer in a chained set to the third and higher-numbered SAM-keyed
entries (see Figure 7). A drawback of this system is that the pointer
from the second SAM-keyed entry to the entry indicated by the original
HIP may be pointing physically upward in the table. If segmentation
is being used, this is unacceptable.

If both the boundary conditions above are in force (segmentation,
no space for DUP pointers), as may happen in large files, a solution
may be achieved at the expense of some search time. When a new
entry duplicates an old, the old entry is flagged and the new entry is
entered in a BCP table containing only duplicate entries. Space is left
in this table for DUP pointers. One must search both tables to locate
a duplicate, but hopefully this duplicate table will be small relative
to the main table and both of these drawbacks will be minimal.

If  one  is  multiple-keying,  an  entry  may  duplicate  an  existing 
key  in  one  column  and  be  a  new  key  in  a  second.   One  cannot  store  the 
entry in two different tables at once, unless they are interleaved.
This  scheme  necessitates  having  a  pointer  to  the  top  node  of  the  duplicate 
table  in  each  column,  as  these  tables  will  be  made  up  of  keys  from 
different  entries  in  each  column.   The  following  table  illustrates  this 
concept  of  a  multiple-keyed  table  with  duplicate  keys  which  is  capable 
of  being  segmented. 
The keys to be searched here are name and age:

          Name column                      Age column
Loc.   Key     LOP   HIP   DUP      Loc.   Key    LOP   HIP   DUP
  1    MAC      3     5               2   *22      8     6
  3    DON     13     7             → 4    22     24    22    10
  5    TIM      9          19         6    29     12    16
  7    JIM                            8    19     14    18
  9    RON           15              10    22
→11    JIM     23          21        12    27
 13    BOB     17                    14    17
 15    SAM                           16    32           26
 17   *AL                            18   *20           20
 19    TIM                           20    21
 21    JIM                 25        22    27
 23    AL                            24    20
 25    JIM                           26    33


*indicates  that  a  key  has  been  flagged  to  show  that  it  has  been 
duplicated 

Notes: 

1  The  locations  here  appear  in  sequential  order  only  to  simplify 
the  example.  In  actual  practice,  entries  may  occupy  any  fixed 
size  of  memory,  depending  on  length  of  keys,  pointers,  etc. 

2  Space  for  the  DUP  is  not  left  for  entries  in  which  none  appears, 
but  only  in  those  entries  in  which  one  exists  in  the  table. 

3 The pointers for entering the table to search on the two keys,
name and age, are assumed to be set at locations 1 and 2
respectively. The arrows indicate the pointers which reference
the first entries in the duplicate table.




If the added device of also entering the key being duplicated
in the duplicate table is used, then when one desires to search for
a key which one knows to be duplicated, one can search the duplicate
table first, bypassing the (hopefully larger) search on the main table.
As one cannot recopy keys in other columns which are not duplicates,
one only copies one column into the duplicate table and uses, as its
argument, a pointer (admittedly upward) to the original entry. As this
upward pointer is used only when the final entry has been retrieved,
we will have at most one segment of table to recall.
Summary

The Binary-Chop Pointer directory scheme seems to have considerable
promise. The authors have noted several possibilities and have given
suggestions for incorporating same into a computer system. Their results
are by no means exhaustive, but, rather, merely suggestive of the
possibilities of a little explored method.